Fix resolve token conflict logic under race condition #7554
Open
alexqyle wants to merge 3 commits into
Signed-off-by: Alex Le <leqiyue@amazon.com>
…tegy" This reverts commit aff3b18. Signed-off-by: Alex Le <leqiyue@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>
What this PR does:
The ring module contains logic that checks for conflicting (duplicated) tokens and resolves the conflicts. However, under a race condition in which two instances join the ring at the same time, this logic does not detect the conflict during the first CAS cycle, because neither new instance has appeared in the ring yet. On subsequent CAS calls, neither instance runs the conflict check again, because neither detects any token change.
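A minimal Go sketch of the failure mode described above. It uses a hypothetical `Ring` map type and hypothetical helpers (`hasConflict`, `tokensChanged`, `gatedCheck`) that are not Cortex's actual API; it only models a conflict check that runs when an update changes some tokens, and assumes both concurrent writes end up in the ring.

```go
package main

import "fmt"

// Instance is a hypothetical, simplified ring member that only carries tokens.
type Instance struct {
	Tokens []uint32
}

// Ring maps instance IDs to instances (hypothetical model, not the Cortex type).
type Ring map[string]Instance

// hasConflict reports whether any token is held by more than one instance.
func hasConflict(r Ring) bool {
	seen := make(map[uint32]string)
	for id, inst := range r {
		for _, t := range inst.Tokens {
			if owner, ok := seen[t]; ok && owner != id {
				return true
			}
			seen[t] = id
		}
	}
	return false
}

// tokensChanged reports whether any instance was added, removed, or had its
// token list modified between before and after.
func tokensChanged(before, after Ring) bool {
	if len(before) != len(after) {
		return true
	}
	for id, a := range after {
		b, ok := before[id]
		if !ok || len(a.Tokens) != len(b.Tokens) {
			return true
		}
		for i := range a.Tokens {
			if a.Tokens[i] != b.Tokens[i] {
				return true
			}
		}
	}
	return false
}

// gatedCheck models the flawed strategy: conflicts are only looked for when
// the update itself changed some tokens.
func gatedCheck(before, after Ring) bool {
	return tokensChanged(before, after) && hasConflict(after)
}

func main() {
	empty := Ring{}
	// First CAS cycle: A and B each write themselves into a ring that does
	// not yet contain the other, so neither sees a conflict.
	viewA := Ring{"A": {Tokens: []uint32{10, 20}}}
	viewB := Ring{"B": {Tokens: []uint32{20, 30}}}
	fmt.Println(gatedCheck(empty, viewA), gatedCheck(empty, viewB)) // false false

	// Assume both writes end up in the ring, which now holds token 20 twice.
	merged := Ring{"A": viewA["A"], "B": viewB["B"]}

	// A later CAS (for example a heartbeat) leaves tokens unchanged, so the
	// gated check is skipped even though the conflict is real.
	fmt.Println(gatedCheck(merged, merged)) // false
	fmt.Println(hasConflict(merged))        // true
}
```

The gate is what hides the conflict: once neither instance's tokens change, the check never runs on the merged state again.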
The original code also had a second issue: even when the resolve-token function performed the correct operation, tokens belonging to instances that were not detected as newly added or updated were not replaced with new tokens, even if they conflicted.
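A minimal Go sketch of why the resolution step should report every instance it touches. The types and `resolveConflicts` are hypothetical, and so is the resolution rule: instances are visited in ascending ID order, the first holder of a token keeps it, and later holders get the next token value not used anywhere in the ring. This is not necessarily the rule Cortex uses. In the example, A is the newly added instance, but B is the one that loses the duplicated token, so a write-back limited to added or updated instances would drop B's new tokens.

```go
package main

import (
	"fmt"
	"sort"
)

// Instance is a hypothetical, simplified ring member that only carries tokens.
type Instance struct {
	Tokens []uint32
}

// Ring maps instance IDs to instances (hypothetical model, not the Cortex type).
type Ring map[string]Instance

// resolveConflicts rewrites duplicated tokens in place and returns the IDs of
// every instance it modified, so the caller can write all of them back.
func resolveConflicts(r Ring) []string {
	ids := make([]string, 0, len(r))
	for id := range r {
		ids = append(ids, id)
	}
	sort.Strings(ids) // fixed visiting order, independent of map iteration

	taken := make(map[uint32]bool) // every token in the ring plus replacements
	for _, inst := range r {
		for _, t := range inst.Tokens {
			taken[t] = true
		}
	}

	claimed := make(map[uint32]bool) // tokens already kept by an earlier instance
	var changed []string
	for _, id := range ids {
		tokens := append([]uint32(nil), r[id].Tokens...)
		modified := false
		for i, t := range tokens {
			if claimed[t] {
				nt := t + 1
				for taken[nt] {
					nt++ // uint32 arithmetic wraps around on overflow
				}
				taken[nt] = true
				tokens[i] = nt
				modified = true
			}
			claimed[tokens[i]] = true
		}
		if modified {
			sort.Slice(tokens, func(a, b int) bool { return tokens[a] < tokens[b] })
			r[id] = Instance{Tokens: tokens}
			changed = append(changed, id)
		}
	}
	return changed
}

func main() {
	r := Ring{
		"A": {Tokens: []uint32{10, 20}}, // newly added in this update
		"B": {Tokens: []uint32{20, 30}}, // already in the ring, not in the update
	}
	fmt.Println(resolveConflicts(r)) // [B]
	fmt.Println(r["B"].Tokens)       // [21 30]
}
```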
This change fixes both issues. With the fix, every CAS call runs the conflict check. The check takes O(n*m) time, where n is the number of instances in the ring and m is the number of tokens per instance, so it should not add significant overhead to each CAS call. The change also adds logic to ensure that every instance whose tokens are changed while resolving conflicts is properly updated. This update is idempotent, so updates made around the same time produce the same result.
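A minimal Go sketch of the fixed flow under the same assumptions as a simplified model: all types and functions (`conflictingTokens`, `resolveConflicts`, `onCAS`) and the "smallest ID keeps the token" rule are hypothetical, not Cortex's code. The full check visits each token once with a hash map, which is the O(n*m) cost mentioned above, and runs on every CAS call. Because resolution depends only on the ring contents, two instances resolving the same merged state reach the same result, and a second pass is a no-op.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Instance and Ring are hypothetical, simplified stand-ins for ring state.
type Instance struct{ Tokens []uint32 }
type Ring map[string]Instance

// conflictingTokens visits every token of every instance once: O(n*m) for n
// instances with m tokens each, using expected O(1) hash-map operations.
func conflictingTokens(r Ring) []uint32 {
	owner := make(map[uint32]string)
	dup := make(map[uint32]bool)
	for id, inst := range r {
		for _, t := range inst.Tokens {
			if o, ok := owner[t]; ok && o != id {
				dup[t] = true
			}
			owner[t] = id
		}
	}
	out := make([]uint32, 0, len(dup))
	for t := range dup {
		out = append(out, t)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// resolveConflicts applies a deterministic rule (smallest ID keeps a token,
// later holders get the next unused value) and returns all modified IDs.
func resolveConflicts(r Ring) []string {
	ids := make([]string, 0, len(r))
	for id := range r {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	taken := make(map[uint32]bool)
	for _, inst := range r {
		for _, t := range inst.Tokens {
			taken[t] = true
		}
	}
	claimed := make(map[uint32]bool)
	var changed []string
	for _, id := range ids {
		tokens := append([]uint32(nil), r[id].Tokens...)
		modified := false
		for i, t := range tokens {
			if claimed[t] {
				nt := t + 1
				for taken[nt] {
					nt++
				}
				taken[nt] = true
				tokens[i] = nt
				modified = true
			}
			claimed[tokens[i]] = true
		}
		if modified {
			sort.Slice(tokens, func(a, b int) bool { return tokens[a] < tokens[b] })
			r[id] = Instance{Tokens: tokens}
			changed = append(changed, id)
		}
	}
	return changed
}

// onCAS runs the full conflict check on every call, not only when tokens
// changed, and returns the IDs of every instance that must be written back.
func onCAS(r Ring) []string {
	if len(conflictingTokens(r)) == 0 {
		return nil
	}
	return resolveConflicts(r)
}

func main() {
	merged := func() Ring {
		return Ring{"A": {Tokens: []uint32{10, 20}}, "B": {Tokens: []uint32{20, 30}}}
	}
	// Two instances resolve the same merged state independently.
	r1, r2 := merged(), merged()
	onCAS(r1)
	onCAS(r2)
	fmt.Println(reflect.DeepEqual(r1, r2)) // true
	// Running the check again on the resolved ring changes nothing.
	fmt.Println(onCAS(r1)) // []
}
```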
Which issue(s) this PR fixes:
NA
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
- docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags